ScALPEL: A Scalable Adaptive Lightweight Performance Evaluation Library for application performance monitoring

Authors

  • Hari K. Pyla
  • Bharath Ramesh
  • Calvin J. Ribbens
  • Srinidhi Varadarajan
Abstract

As supercomputers continue to grow in scale and capabilities, it is becoming increasingly difficult to isolate processor and system level causes of performance degradation. Over the last several years, a significant number of performance analysis and monitoring tools have been built or proposed. However, these tools suffer from several important shortcomings, particularly in distributed environments. In this paper we present ScALPEL, a Scalable Adaptive Lightweight Performance Evaluation Library for application performance monitoring at the functional level. Our approach provides several distinct advantages. First, ScALPEL is portable across a wide variety of architectures, and its ability to selectively monitor functions keeps run-time overhead low, enabling its use for large-scale production applications. Second, it is run-time configurable, enabling dynamic selection of both the functions to profile and the events of interest on a per-function basis. Third, our approach is transparent in that it requires no source code modifications. Finally, ScALPEL is implemented as a pluggable unit by reusing existing performance monitoring frameworks such as Perfmon and PAPI and extending them to support both sequential and MPI applications.
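
The ideas of selective, run-time configurable, source-transparent monitoring described in the abstract can be illustrated with a small sketch. This is not ScALPEL's implementation: ScALPEL builds on hardware-counter frameworks such as Perfmon and PAPI, whereas the sketch below is only a Python analogy built on the interpreter's `sys.setprofile` hook. The class name `SelectiveMonitor`, its `reconfigure` method, and the event names `"calls"` and `"wall_time"` are hypothetical names chosen for illustration.

```python
import sys
import time
from collections import defaultdict


class SelectiveMonitor:
    """Hypothetical sketch: monitor only the functions named in a config.

    The config maps a function name to the set of events to record for it
    ("calls" and/or "wall_time"). Monitored functions are not modified.
    Matching is by bare function name only, which is a simplification.
    """

    def __init__(self, config):
        self.config = dict(config)
        self.stats = defaultdict(lambda: {"calls": 0, "wall_time": 0.0})
        self._starts = {}  # function name -> stack of start timestamps

    def reconfigure(self, config):
        # Change the monitored functions and events while the program runs.
        self.config = dict(config)

    def _hook(self, frame, event, arg):
        name = frame.f_code.co_name
        if event == "call":
            events = self.config.get(name)
            if not events:
                return  # unselected functions only pay for this lookup
            if "calls" in events:
                self.stats[name]["calls"] += 1
            if "wall_time" in events:
                self._starts.setdefault(name, []).append(time.perf_counter())
        elif event == "return":
            pending = self._starts.get(name)
            if pending:
                elapsed = time.perf_counter() - pending.pop()
                self.stats[name]["wall_time"] += elapsed

    def __enter__(self):
        sys.setprofile(self._hook)
        return self

    def __exit__(self, *exc):
        sys.setprofile(None)
        return False


def work(n):
    return sum(range(n))


def other():
    return 1


if __name__ == "__main__":
    monitor = SelectiveMonitor({"work": {"calls", "wall_time"}})
    with monitor:
        work(1000)
        other()
        # Switch the selection at run time: stop watching work, watch other.
        monitor.reconfigure({"other": {"calls"}})
        work(1000)
        other()
    print(dict(monitor.stats))
```

In this sketch the per-function configuration plays the role of the run-time selection of functions and events described above, and the profiling hook stands in for whatever interception mechanism a native library would use.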

Similar resources

An Evaluation of an Adaptive Generalized Likelihood Ratio Charts for Monitoring the Process Mean

When the objective is quick detection of both small and large shifts in the mean of a normally distributed process, generalized likelihood ratio (GLR) control charts perform better than other control charts. The charts presented by Reynolds and Lou use only fixed parameters. According to the studies, charts with variable parameters detect process shifts faster than fixed pa...

Intelligent Health Evaluation Method of Slewing Bearing Adopting Multiple Types of Signals from Monitoring System

Slewing bearings, which are widely used in tanks, excavators and wind turbines, are a critical component of rotating machinery. Standard procedures for life calculation and condition assessment have been established for general rolling bearings; however, relatively little literature on the health condition assessment of slewing bearings has been published. Real time health condi...

Libmonitor: A tool for first-party monitoring

Libmonitor is a library that provides hooks into a program and provides callback functions for monitoring the begin and end of processes and threads and maintains control across fork, exec and in the presence of signals. It provides a layer on which to build first-party profiling tools for performance or correctness. Libmonitor is lightweight, fully scalable, easy to use and does not require ac...

Falcon: On-line Monitoring for Steering Parallel Programs

Advances in high performance computing, communications, and user interfaces enable developers to construct increasingly interactive high performance applications. The Falcon system presented in this paper supports such interactivity by providing runtime libraries, tools, and user interfaces that permit the on-line monitoring and steering of large-scale parallel codes. The principal aspects of F...

A Review on Evaluation of Multilevel Checkpointing System in Distributed Environment

Nowadays there is a need for high-performance computer systems in distributed environments. As the system mean time before failure correspondingly drops, applications must checkpoint frequently to make progress. However, at scale, the cost of checkpointing becomes prohibitive. A solution to this problem is multilevel checkpointing, which employs multiple types of checkpoints in a single run. Ligh...

Journal:
  • CoRR

Volume: abs/0903.0035

Publication year: 2009